159 research outputs found

    Identifying a High Fraction of the Human Genome to be under Selective Constraint Using GERP++

    Computational efforts to identify functional elements within genomes leverage comparative sequence information by looking for regions that exhibit evidence of selective constraint. One way of detecting constrained elements is to follow a bottom-up approach by computing constraint scores for individual positions of a multiple alignment and then defining constrained elements as segments of contiguous, highly scoring nucleotide positions. Here we present GERP++, a new tool that uses maximum likelihood evolutionary rate estimation for position-specific scoring and, in contrast to previous bottom-up methods, a novel dynamic programming approach to subsequently define constrained elements. GERP++ evaluates a richer set of candidate element breakpoints and ranks them based on statistical significance, eliminating the need for biased heuristic extension techniques. Using GERP++ we identify over 1.3 million constrained elements spanning over 7% of the human genome. We predict a higher fraction than earlier estimates largely due to the annotation of longer constrained elements, which improves the one-to-one correspondence between predicted elements and known functional sequences. GERP++ is an efficient and effective tool to provide both nucleotide- and element-level constraint scores within deep multiple sequence alignments.
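
    The generic bottom-up idea described in this abstract (score each alignment position, then group contiguous high-scoring positions into elements) can be sketched as below. This is only an illustration of the simple thresholding approach, not GERP++'s dynamic programming method or its significance ranking; the function name, the scores, the threshold, and the minimum length are all hypothetical.

```python
from typing import List, Tuple


def contiguous_high_scoring_segments(
    scores: List[float], threshold: float, min_length: int = 1
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of runs whose scores are all >= threshold.

    Hypothetical helper: a naive stand-in for "constrained element" calling,
    not the GERP++ algorithm.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # index where the current run began, or None if not in a run
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at position i - 1; keep it if it is long enough.
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        segments.append((start, len(scores)))
    return segments


# Made-up per-position constraint scores, for illustration only.
example_scores = [0.1, 2.5, 3.0, 2.2, -1.0, 2.8, 0.3, 2.1, 2.4]
example_segments = contiguous_high_scoring_segments(
    example_scores, threshold=2.0, min_length=2
)
print(example_segments)
```

    With these made-up inputs, the isolated high score at index 5 is dropped because it is shorter than the minimum length, which hints at why fixed thresholds and ad hoc extension rules can bias element boundaries.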

    A fresh look at the evolution and diversification of photochemical reaction centers

    In this review, I reexamine the origin and diversification of photochemical reaction centers based on the known phylogenetic relations of the core subunits, and with the aid of sequence and structural alignments. I show, for example, that the protein folds at the C-terminus of the D1 and D2 subunits of Photosystem II, which are essential for the coordination of the water-oxidizing complex, were already in place in the most ancestral Type II reaction center subunit. I then evaluate the evolution of reaction centers in the context of the rise and expansion of the different groups of bacteria based on recent large-scale phylogenetic analyses. I find that the Heliobacteriaceae family of Firmicutes appears to be the earliest branching of the known groups of phototrophic bacteria; however, the origin of photochemical reaction centers and chlorophyll synthesis cannot be placed in this group. Moreover, it becomes evident that the Acidobacteria and the Proteobacteria shared a more recent common phototrophic ancestor, and this is also likely for the Chloroflexi and the Cyanobacteria. Finally, I argue that the discrepancies among the phylogenies of the reaction center proteins, chlorophyll synthesis enzymes, and the species tree of bacteria are best explained if both types of photochemical reaction centers evolved before the diversification of the known phyla of phototrophic bacteria. The primordial phototrophic ancestor must have had both Type I and Type II reaction centers

    Sequencing and Bioinformatics-Based Analyses of the microRNA Transcriptome in Hepatitis B–Related Hepatocellular Carcinoma

    MicroRNAs (miRNAs) participate in crucial biological processes, and it is now evident that miRNA alterations are involved in the progression of human cancers. Recent studies on miRNA profiling performed with cloning suggest that sequencing is useful for the detection of novel miRNAs, modifications, and precise compositions and that miRNA expression levels calculated by clone count are reproducible. Here we focus on sequencing of miRNA to obtain a comprehensive profile and characterization of these transcriptomes as they relate to human liver. 454 sequencing and conventional cloning of 22 pairs of HCC and adjacent normal liver (ANL) samples and 3 HCC cell lines identified more than 314,000 reliable reads from HCC and more than 268,000 from ANL for registered human miRNAs. Computational bioinformatics identified 7 novel miRNAs with high conservation, 15 novel opposite miRNAs, and 3 novel antisense miRNAs. Moreover, sequencing can detect miRNA modifications, including adenosine-to-inosine editing in miR-376 families. Expression profiling using clone count analysis was used to identify miRNAs that are expressed aberrantly in liver cancer, including miR-122, miR-21, and miR-34a. Furthermore, sequencing-based miRNA clustering, but not individual miRNA, detects high-risk patients who have a high potential for early tumor recurrence after liver surgery (P = 0.006), and this clustering is the only significant variable among pathological and clinical variables (P = 0.022). We believe that the combination of sequencing and bioinformatics will accelerate the discovery of novel miRNAs and biomarkers involved in human liver cancer.

    Prediction of Co-Receptor Usage of HIV-1 from Genotype

    Human Immunodeficiency Virus 1 enters host cells using a receptor (CD4) and one of two co-receptors (CCR5 or CXCR4). Recently, a new class of antiretroviral drugs that specifically bind to the co-receptor CCR5, and thus inhibit virus entry, has entered clinical practice. Accurate prediction of the co-receptor used by the virus in the patient is important as it allows for personalized selection of effective drugs and prognosis of disease progression. We have investigated whether it is possible to predict co-receptor usage accurately by analyzing the amino acid sequence of the main determinant of co-receptor usage, i.e., the third variable loop V3 of the gp120 protein. We developed a two-level machine learning approach that in the first level considers two different properties important for protein-protein binding derived from structural models of V3 and V3 sequences. The second level combines the two predictions of the first level. The two-level method predicts usage of CXCR4 co-receptor for new V3 sequences within seconds, with an area under the ROC curve of 0.937±0.004. Moreover, it is relatively robust against insertions and deletions, which frequently occur in V3. The approach could help clinicians to find optimal personalized treatments, and it offers new insights into the molecular basis of co-receptor usage. For instance, it quantifies the importance for co-receptor usage of a pocket that probably is responsible for binding sulfated tyrosine.

    Analysis of genetic systems using experimental evolution and whole-genome sequencing

    The application of whole-genome sequencing to the study of microbial evolution promises to reveal the complex functional networks of mutations that underlie adaptation. A recent study of parallel evolution in populations of Escherichia coli shows how adaptation involves both functional changes to specific proteins as well as global changes in regulation

    Quantitative Deep Sequencing Reveals Dynamic HIV-1 Escape and Large Population Shifts during CCR5 Antagonist Therapy In Vivo

    High-throughput sequencing platforms provide an approach for detecting rare HIV-1 variants and documenting more fully quasispecies diversity. We applied this technology to the V3 loop-coding region of env in samples collected from 4 chronically HIV-infected subjects in whom CCR5 antagonist (vicriviroc [VVC]) therapy failed. Between 25,000 and 140,000 amplified sequences were obtained per sample. Profound baseline V3 loop sequence heterogeneity existed; predicted CXCR4-using populations were identified in a largely CCR5-using population. The V3 loop forms associated with subsequent virologic failure, either through CXCR4 use or the emergence of high-level VVC resistance, were present as minor variants at 0.8–2.8% of baseline samples. Extreme, rapid shifts in population frequencies toward these forms occurred, and deep sequencing provided a detailed view of the rapid evolutionary impact of VVC selection. Greater V3 diversity was observed post-selection. This previously unreported degree of V3 loop sequence diversity has implications for viral pathogenesis, vaccine design, and the optimal use of HIV-1 CCR5 antagonists.

    Advancing Eucalyptus genomics: identification and sequencing of lignin biosynthesis genes from deep-coverage BAC libraries

    Background: Eucalyptus species are among the most planted hardwoods in the world because of their rapid growth, adaptability and valuable wood properties. The development and integration of genomic resources into breeding practice will be increasingly important in the decades to come. Bacterial artificial chromosome (BAC) libraries are key genomic tools that enable positional cloning of important traits, synteny evaluation, and the development of genome framework physical maps for genetic linkage and genome sequencing. Results: We describe the construction and characterization of two deep-coverage BAC libraries, EG_Ba and EG_Bb, obtained from nuclear DNA fragments of E. grandis (clone BRASUZ1) digested with HindIII and BstYI, respectively. Genome coverages of 17 and 15 haploid genome equivalents were estimated for EG_Ba and EG_Bb, respectively. Both libraries contained large inserts, with average sizes ranging from 135 Kb (EG_Bb) to 157 Kb (EG_Ba), and very low extra-nuclear genome contamination, providing a probability of finding a single-copy gene of ≥ 99.99%. Libraries were screened for the presence of several genes of interest via hybridizations to high-density BAC filters followed by PCR validation. Five selected BAC clones were sequenced and assembled using the Roche GS FLX technology, providing the whole sequence of the E. grandis chloroplast genome and complete genomic sequences of important lignin biosynthesis genes. Conclusions: The two E. grandis BAC libraries described in this study represent an important milestone for the advancement of Eucalyptus genomics and forest tree research. These BAC resources have a highly redundant genome coverage (> 15×), contain large average inserts and have a very low percentage of clones with organellar DNA or empty vectors. These publicly available BAC libraries are thus suitable for a broad range of applications in genetic and genomic research in Eucalyptus and possibly in related species of Myrtaceae, including genome sequencing, gene isolation, functional and comparative genomics. Because they have been constructed using the same tree (E. grandis BRASUZ1) whose full genome is being sequenced, they should prove instrumental for assembly and gap filling of the upcoming Eucalyptus reference genome sequence.

    MACSE: Multiple Alignment of Coding SEquences Accounting for Frameshifts and Stop Codons

    Until now the most efficient solution to align nucleotide sequences containing open reading frames was to use indirect procedures that align amino acid translations before reporting the inferred gap positions at the codon level. There are two important pitfalls with this approach. Firstly, any premature stop codon impedes using such a strategy. Secondly, each sequence is translated with the same reading frame from beginning to end, so that the presence of a single additional nucleotide leads to both aberrant translation and alignment.
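
    The second pitfall can be illustrated with a small sketch that translates a sequence in one fixed reading frame using the standard genetic code. The sequences and the helper name `translate` are made up for illustration, and this is not MACSE's algorithm; it only shows why a single inserted nucleotide scrambles every downstream codon, and how a premature stop codon shows up in the translation.

```python
# Standard genetic code, laid out in the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate in a single fixed frame; a trailing partial codon is ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical coding sequence and a copy with one extra nucleotide inserted
# after the second codon (a frameshift).
original = "ATGGCTAAAGGTCTG"
frameshifted = original[:6] + "C" + original[6:]

original_protein = translate(original)          # codons ATG GCT AAA GGT CTG
frameshifted_protein = translate(frameshifted)  # codons ATG GCT CAA AGG TCT (+ G)
print(original_protein, frameshifted_protein)

# A premature stop codon (TAA) appears as '*' inside the translation.
with_stop = translate("ATGTAAGCT")
print(with_stop)
```

    Only the two residues upstream of the insertion agree between the two translations, so aligning the amino acid sequences would misplace every downstream codon.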

    Diversity and dynamics of rare and of resident bacterial populations in coastal sands

    Coastal sands filter and accumulate organic and inorganic materials from the terrestrial and marine environment, and thus provide a high diversity of microbial niches. Sands of temperate climate zones represent a temporally and spatially highly dynamic marine environment characterized by strong physical mixing and seasonal variation. Yet little is known about the temporal fluctuations of resident and rare members of bacterial communities in this environment. By combining community fingerprinting via pyrosequencing of ribosomal genes with the characterization of multiple environmental parameters, we disentangled the effects of seasonality, environmental heterogeneity, sediment depth and biogeochemical gradients on the fluctuations of bacterial communities of marine sands. Surprisingly, only 3–5% of all bacterial types of a given depth zone were present at all times, but 50–80% of them belonged to the most abundant types in the data set. About 60–70% of the bacterial types consisted of tag sequences occurring only once over a period of 1 year. Most members of the rare biosphere did not become abundant at any time or at any sediment depth, but varied significantly with environmental parameters associated with nutritional stress. Despite the large proportion and turnover of rare organisms, the overall community patterns were driven by deterministic relationships associated with seasonal fluctuations in key biogeochemical parameters related to primary productivity. The maintenance of major biogeochemical functions throughout the observation period suggests that the small proportion of resident bacterial types in sands perform the key biogeochemical processes, with minimal effects from the rare fraction of the communities